Method and system for generation, storage and distribution of omni-directional object views

ABSTRACT

Image acquisition refers to the taking of digital images of multiple views of the object of interest. In the Processing step, the constituent images collected in the image acquisition step are selected and further processed to form a multimedia sequence which allows for the interactive viewing of the object. Furthermore, during the Processing phase, the entire multimedia sequence is compressed and digitally signed to authorize its viewing. In the Storage and Caching step, the resulting multimedia sequence is sent to a storage server. In the Transmission and Viewing step, a Viewer (individual) may request a particular multi-media sequence, for example, by selecting a particular hyperlink within a browser, which initiates the downloading, checking of authorization to view, decompression and interactive rendering of the multi-media sequence on the end-user's terminal, which could be any one of a variety of devices, including a desktop PC or a hand-held device.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to imaging and more specifically to imaging of objects.

2. Brief Description of the Prior Art

A common obstacle to the sale of items on the Internet is that it is difficult for consumers to gain an understanding of the three-dimensional characteristics of an item being contemplated for purchase. In the conventional retail store environment, the consumer often has the opportunity to look at an item of interest from multiple directions and distances. This in-person experience allows the consumer to understand and appreciate the physical shape and detail of the object more closely and to be assured that the item they are purchasing meets their expectations in terms of quality, desired feature set and characteristics. On the Internet, achieving a similar level of interactive product inspection and evaluation by a consumer is much more difficult, since the browsing experience of most Internet consumers is primarily a two-dimensional one, e.g. looking at pictures or reading text descriptions of items. While this gives a reasonable representation of the object, more complete interaction which rivals that available in a conventional retail environment can be desirable. Such an experience would reduce the barriers to purchasing over the Internet that result from the user having an incomplete picture limited to 2-D static photographs, non-interactive video, illustrations and textual descriptions of the item being contemplated for purchase. A system and method which would allow for a multi-view interactive experience of items would be desirable to consumers and vendors alike.

Images are useful in depicting the attributes of scenes, people and objects. However, the advent of digitized imagery has demonstrated that the image can become an active object that can be manipulated digitally via computer processing. Images can also become interactive entities, where clicking on different portions of an image can yield different processing outcomes which could come from a variety of multimedia sources, such as sounds, animations, other images, text, etc. For example, image maps are often used within the World Wide Web to allow a large amount of information to be displayed in an intuitive graphical fashion to a user, allowing for “direct manipulation” GUIs. By clicking within different portions of the image, different outcomes can be triggered, such as loading of web pages that are linked to those portions of the image, or “roll overs” which dynamically display additional text describing a button which may be selected. For example, a 3D effect can be achieved by acquiring a set of images of a rotating object in sequence and then allowing for the smooth sequential selection of those images by the use of a GUI element such as a slider, thus giving the appearance of 3D rotational object motion. The images may be from real scenes, or synthetically generated using computer graphics techniques. This multimedia program may be in the form of a browser embedded application or “Applet” as depicted in FIG. 1.
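By way of illustration only, the slider-driven selection of images described above might be sketched as follows. This is a minimal sketch and not a description of any particular prior art program; the function names `frame_for_slider` and `step_frame`, and the convention that the slider reports a position between 0.0 and 1.0, are assumptions made for the example.

```python
def frame_for_slider(position: float, frame_count: int) -> int:
    """Map a slider position in [0.0, 1.0] to a frame index in [0, frame_count - 1].

    Hypothetical helper: the slider range and name are assumptions for this sketch.
    """
    if frame_count <= 0:
        raise ValueError("frame_count must be positive")
    clamped = min(max(position, 0.0), 1.0)  # ignore out-of-range slider values
    # Scale to the number of frames; position 1.0 maps to the last frame.
    return min(int(clamped * frame_count), frame_count - 1)


def step_frame(index: int, delta: int, frame_count: int) -> int:
    """Advance by delta frames, wrapping so that rotation appears continuous."""
    return (index + delta) % frame_count
```

With 36 images taken at 10 degree intervals, moving the slider to its midpoint would select frame 18, and stepping back one frame from frame 0 would wrap around to frame 35.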

Additionally, besides linking to other images or web pages with multimedia content, different input actions to a multimedia program (e.g. an internet browser) can cause the selection of different images, such as causing the display of magnified portions around the area clicked and so forth.

Additionally, with the advent of digital image processing programs aimed at the digital manipulation and enhancement of digitized images, it has become possible for multimedia authors to easily and intuitively build image-based interactive programs which can be run within any web browser. For example, multimedia authoring programs which run on the PC, such as Adobe LiveMotion or MacroMedia Director™, allow developers to create content for CDs, DVDs and the web.

The system described here enhances and extends existing systems and processes for digital image editing and multimedia authoring in a number of novel ways which are described below.

It is presently difficult to generate interactive multiple view images of objects for a number of reasons. Stand-alone software applications for creation of interactive object viewing are complex to install and use, and are expensive to purchase. For example, applications such as MGI Photo Vista 3D Objects, VR Objectworx and Quicktime VR Authoring Studio are complex to install, and difficult to master for a non-technical audience. We present a self-installing application which runs inside a web browser, and is easy to use, even for the technically untrained.

Another drawback of existing programs for the creation of interactive multiple view images has been the high up-front cost of purchasing these applications, since they are sold on a licensing basis which presumes an unlimited number of images may be created for each granted license. We present methods and architectures which permit the software to be freely distributed and licensed on a pay-per-use basis, using cryptographic techniques to enforce the terms of the licensing.

An additional impediment faced by the prior art in interactive image generation is that expensive special purpose rotating stages must be purchased to rotate the object to be photographed. This additional cost is such that many individuals that might desire to generate interactive images are currently prevented from doing so by the high costs and complexity of purchasing and installing the electromechanical systems required to acquire such images. We provide several ways to eliminate these barriers: by providing a software-only means to acquire said images, by enabling the use of extremely low cost spring wound rotating stages, and by providing a self-service kiosk with all of the necessary hardware elements to carry out the image acquisition and processing necessary to achieve the generation of the interactive images.

In the current state of the art of multi-media, the notion of the multi-media player refers to an application program which can interpret media objects and multi-media programs in order to present a multi-media presentation to an end-user. The player can operate in an open loop fashion, deterministically presenting media objects without end-user intervention, or may be interactive, where the presentation sequence of the media objects may be controlled by certain user inputs.

In general, most multi-media systems, such as MacroMedia's Flash system, require a native multi-media player plug-in, which interprets files in Flash Format that contain the specific multi-media instructions and objects that will carry out the presentation. The Flash player is written in the native instruction set of the computer that is rendering the multi-media presentation. Since the processor cannot natively interpret the multi-media sequence, this creates the pre-requisite that the user have installed the corresponding media player on their PC in order to be able to play the media sequence. The downloading and installation of the media player can impose an inconvenience on the end-user, since the media player can be large and take a long time to download, and installation processes for media players can be error prone. It is therefore desirable to avoid this step. We describe a solution that uses a very small special purpose media player for our multi-media sequences which downloads in an almost instantaneous manner and is written in Java programming language bytecode. Since the majority of Web browsers come with the Java bytecode interpreter pre-installed, the end-user can enjoy the multi-media sequences while avoiding the download of a full media player. The Java™ programming language provides a basis for a predictable run-time environment (Virtual Machine) for programs which operate on computers having differing processor instruction sets and operating systems. A number of major internet browsing programs provide a Java run-time environment which allows for programs compiled into Java byte code to execute within what is commonly known as an applet. An applet is a small program which runs within the context of a larger application such as a web browser and can execute inside of a standard web page as depicted in FIG. 1. The use of the Java Run Time eliminates the need for the installation of a specialized plug-in program to allow for the extension of the capabilities of the web browser, such as, for example, the MacroMedia Flash Player Plug-in. Instead, an applet written in the Java language and compiled into byte code may be used to add new programmatic features (such as multimedia capabilities) to a browser. Other languages such as Microsoft's C# may serve as well for the implementation, replacing Java. Alternatively, Javascript may be used to animate the 3D sequences and provide interactive user input and reactivity if desired.

SUMMARY OF THE INVENTION

In one embodiment, the main steps in the operation of the system along with the associated hardware system components are indicated in FIG. 2.

The system processing flow can be broken into four main phases:

1. Image Acquisition

2. Processing

3. Storage

4. Transmission and Viewing

The key hardware elements for realization of the system are:

1. Digital Photographic or Video Camera

2. Personal Computer (PC)

3. Application Host Server

4. Storage and Caching Servers

5. Viewing PCs

Image acquisition refers to the taking of digital images of multiple views of the object of interest. In the Processing step, the constituent images collected in the image acquisition step are selected and further processed to form a multimedia sequence which allows for the interactive viewing of the object. Furthermore, during the Processing phase, the entire multimedia sequence is compressed and digitally signed to authorize its viewing. In the Storage and Caching step, the resulting multimedia sequence is sent to a storage server. In the Transmission and Viewing step, a Viewer (individual) may request a particular multi-media sequence, for example, by selecting a particular hyperlink within a browser, which initiates the downloading, checking of authorization to view, decompression and interactive rendering of the multi-media sequence on the end-user's terminal, which could be any one of a variety of devices, including a desktop PC or a hand-held device.

In the image acquisition step, image acquisition can be done by a variety of means, three of which are illustrated in FIG. 2. For example, using a hand-held CAMERA, VIDEO & PC, and holding the object of interest in a fixed position, the user may circle the object and take a number of images which capture different aspects (directional views) of the object (see FIG. 3). These images are temporarily stored in the memory of the digital camera. Alternatively, by using an image acquisition device, such as a camera or video recorder, as depicted in STABILIZED CAMERA, AND/OR VIDEO, AND/OR ROTATING STAGE and PC, placing the object on the rotating stage, and taking images at differing time intervals as the object rotates, a sequence of different aspects of the object can be captured. The camera may be stabilized either electronically or by use of a tripod. Alternatively, the object can be manually rotated through a number of positions, and images acquired at the different object positions. In another image acquisition embodiment, a publicly situated SELF-CONTAINED ROTATING STAGE KIOSK containing an illumination system, camera and rotating stage can be used as a vending system, into which the object of interest is placed, and the kiosk automatically takes a series of images.

In the Processing Step, the images captured in the previous step are processed using a Processing application. The processing application permits all of the captured images illustrating the differing aspects of the object to be viewed, selected and aligned and then composed into an interactive multi-media sequence. This application may run stand-alone on the PC, in a shared mode between a host computer and the user's PC, or completely on the host, with the user's PC acting as a thin client (see FIG. 4 and FIG. 5). The application provides a means for the building and preview of the finished sequence. Once the author is satisfied with the results of the sequence, the sequence is then compressed, encapsulated and authorized for distribution by the use of an authorizing digital signature.

In the storage step, the resulting sequence can be stored on a storage and distribution server which serves as a repository for the access of the finished multi-media sequences by the viewing public. The storage repository may be mirrored and distributed via a number of well known web caching mechanisms to improve the access time for the viewing public and distribute the load on the server.

Finally, in the Transmission and Viewing Step, members of the viewing public request specific multi-media sequences and the view applet (see FIG. 1), for example, by selecting specific hyperlinks embedded within HTML, which triggers the transmission of the multi-media sequence to the viewing individual's terminal (whether a PC or handheld), where the sequence is authorized by checking of the digital signature, decompressed and made available for interactive viewing.
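As a hedged illustration of the compress, sign, verify and decompress flow summarized above, the following minimal sketch packages a list of frames and later opens it. It is not the encapsulation format of the system. Two assumptions are made purely for the example: the function names `package_sequence` and `open_sequence` are hypothetical, and an HMAC with a shared secret key stands in for the authorizing digital signature (an HMAC is a message authentication code, not a public-key signature; it is used here only because it is available in the Python standard library).

```python
import hashlib
import hmac
import zlib
from typing import List


def package_sequence(frames: List[bytes], key: bytes) -> bytes:
    """Compress the frames and prepend an authentication tag (stand-in signature)."""
    # Length-prefix each frame so the frames can be separated again after decompression.
    raw = b"".join(len(f).to_bytes(4, "big") + f for f in frames)
    payload = zlib.compress(raw)
    tag = hmac.new(key, payload, hashlib.sha256).digest()  # 32-byte tag
    return tag + payload


def open_sequence(package: bytes, key: bytes) -> List[bytes]:
    """Check the tag first; only an authorized package is decompressed."""
    tag, payload = package[:32], package[32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise PermissionError("signature check failed; sequence not authorized")
    raw = zlib.decompress(payload)
    frames, offset = [], 0
    while offset < len(raw):
        size = int.from_bytes(raw[offset:offset + 4], "big")
        offset += 4
        frames.append(raw[offset:offset + size])
        offset += size
    return frames
```

In this sketch the viewer rejects a package before decompressing it if even one byte has been altered, which mirrors the order of operations in the Transmission and Viewing Step: authorization check, then decompression, then viewing.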

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1: Illustrates the Java Viewing Applet Embedded in a Browser Window

FIG. 2: Is an overview of the System for Generation, Storage and Distribution of Omni-directional Object Views.

FIG. 3: Illustrates the process of acquisition of images around the object of interest using an image acquisition device.

FIG. 4: Illustrates the Network Based Distributed Media Object (Image) Editing and Multimedia Authoring Implementation with a “thin client”.

FIG. 5: Illustrates the Network Based Distributed Media Object (Image) Editing and Multimedia Authoring Implementation with a “thick client”.

FIG. 6: Illustrates the image acquisition system for the camera, tripod and rotating platter.

FIG. 7: Illustrates the Self Contained View Acquisition System Kiosk

FIG. 8: Illustrates the Cylindrical Turntable Scanner Kinematics realization for the image acquisition system.

FIG. 9: Illustrates the Spherical Kinematics realization for the image acquisition system.

FIG. 10: Illustrates the Non-Articulated View Acquisition Platform realization for the image acquisition system.

FIG. 11: Illustrates the encapsulation of the media player applet and multi-media object sequence.

FIG. 12: Illustrates the Self Contained View Acquisition System Kiosk hardware blocks.

FIG. 13: Illustrates the Self Contained View Acquisition System Kiosk software modules.

FIG. 14: Illustrates the image editing process for generation of interactive multimedia sequences.

FIG. 15: Illustrates The Multimedia Authoring Cycle for generation of interactive multimedia sequences.

FIG. 16: Illustrates the Editing and Authoring Tools, Objects and Work Flow for generation of interactive multimedia sequences.

FIG. 17: Illustrates State Diagram for View Applet

FIG. 18: Illustrates the storage database, transmission and playback of the interactive multi-media sequence.

FIG. 19: Illustrates the Direct Viewing of the media sequence from View Host.

FIG. 20: Illustrates the image differencing process for identification of the image object of interest.

FIG. 21: Illustrates the background masking process using the image mask.

FIG. 22: Illustrates the Foreground/Background Histogram for automatic threshold determination.

FIG. 23: Illustrates the Dilation Shells of Selection Mask.

FIG. 24: Illustrates the Alpha Assignments for Dilation Shells.

FIG. 25: Illustrates a raw acquired image of the object of interest.

FIG. 26: Illustrates the selection indicator for the desired axis of rotation of the object of interest for a first view.

FIG. 27: Illustrates the rotationally rectified desired axis of rotation for a first view.

FIG. 28: Illustrates the selection indicator for the desired axis of rotation of the object of interest for a second view.

FIG. 29: Illustrates the rotationally rectified desired axis of rotation for a second view.

FIG. 30: Illustrates the superimposition of the rotationally rectified first and second views.

FIG. 31: Illustrates the vertical translation rectification of the first and second views.

FIG. 32: Illustrates the scaling rectification of the first and second views, using the scaling operator center coordinate indicator.

FIG. 33: Illustrates the final results of the rotation, translation and scaling rectification steps.

FIG. 34: Illustrates the perimeters of the convex intersection of the areas of views.

FIG. 35: Illustrates a rectangle inscribed in the intersection perimeter.

FIG. 36: Illustrates the maximum area inscribed rectangle for the intersection area of multiple views.

FIG. 37: Illustrates the unified crop boundaries for the set of images.

FIG. 38: Illustrates the final crop boundaries for the set of images after balancing the left/right distance for the crop boundaries around the axis of rotation.

FIG. 39: Illustrates the motion field of the object of interest and static background.

FIG. 40: Illustrates the synthetic reticle for the alignment of the rotating platform.

FIG. 41: Illustrates the video decompression sequence for receipt and storage of the multimedia frame.

FIG. 42: Illustrates Spherical Coordinate Scan Pattern for object image acquisition.

FIG. 43: Illustrates Geodesic Scan Pattern for object image acquisition.

FIG. 44: Illustrates the Spherical View Indexing Torus.

FIG. 45: Illustrates the Vertex indices for Geodesic Dome, Frequency 2, Class 1 (Top View of Hemisphere).

FIG. 46: Illustrates the Registration of Unique Item Number using Print on Demand Bar-Code

FIG. 47: Illustrates the On-Demand Printing of Unique Bar Code ID using Printer at User's PC.

FIG. 48: Illustrates the Point of Scan Key-board Correspondence of Item with Printed Bar Code and Acquisition, Encapsulation and Publishing of Item.

FIG. 49: Illustrates the Hyperlinking to View Hosting Service.

FIG. 50: Illustrates the document object layout for the Javascript media sequence presentation.

FIG. 51: Illustrates the images dynamically loaded when corresponding sections of the slider image map are selected via mouse.

FIG. 52: Is a listing of a Javascript program which realizes the multi-media image sequence presentation.

DETAILED DESCRIPTION OF THE INVENTION

Referring now to the drawings in detail, a system in accordance with an embodiment of the invention includes a processing flow that can be broken into the following four main phases, which are described in more detail herein:

1. Image Acquisition

2. Processing

3. Storage

4. Transmission and Viewing

Picture Acquisition

In the image acquisition step, the constituent set of images making up the multi-media sequence is taken. This can be accomplished using a variety of different means, including the use of a Hand Rotated Object or Camera, Rotating Stage, or Self Contained View Acquisition Kiosk. These techniques are described in more detail below.

Hand Rotated Object or Camera

In this mode a set of pictures is taken in one of two ways using a hand-held camera, either video or still. In the first, the object is held fixed and the camera is moved around the object while a sequence of images is taken, all the while keeping the object centered manually in the camera viewfinder apparatus. Alternatively, the hand-held camera may be held approximately stationary and the object rotated in place by hand. At each new object rotational position, a new exposure is taken. This is illustrated in FIG. 3. Here positions 1 through 4 illustrate examples of different directions and ranges from which the images may be acquired using an image acquisition device.

Rotating Stage

A problem faced by individuals desiring to acquire 3-D interactive images is the expense of the hardware and software needed to acquire high-quality rotational interactive sequences of objects with the background suppressed and/or composited. An alternative cost-effective procedure for achieving high quality object sequences is to use a slow-speed rotational table along with the time-lapse mode of a conventional digital camera. In the preferred embodiment a low cost spring wound rotational mechanism can be used, although DC and AC variable speed electrical motors can also be used. The acquisition setup is illustrated in FIG. 6, which shows the image acquisition device, which can acquire and store the images, the rotating platter, which holds and rotates the object, and the tripod, which holds the camera steady between frame acquisitions.

In this mode, the object is placed on a rotating stage. The stage mechanism may be manually actuated, electrically actuated via an open or closed loop means, or spring actuated via wind-up. The stage is set to rotate while the camera is held fixed, either manually or via tripod, and a succession of exposures is taken at specific time or angle intervals. If closed loop control of the rotating stage is possible, then the rotating stage may be commanded to specific positions by the PC, and exposures taken upon completion of each motion. If the platform is moving in open loop fashion, and the platform rotational velocity in degrees/second is known, then the camera may be programmed to automatically gather exposures at a given time interval that yields a given change in table rotational angle between exposure points.

The following procedure is used while holding the ambient scene lighting and camera exposure approximately constant between image acquisitions:

1. (Optional if “Automatic Masking via Background Subtraction”, later described herein, is used) The rotating stage and background are photographed without the desired object to yield a digital image (P₀).

2. The desired foreground object is put on top of the slowly rotating turntable.

3. A sequence of images is taken in time lapse mode.

If shots are desired every n degrees of product rotation, then the timing interval between shots, in seconds, is set to n / (table rotations/minute × 360 degrees/rotation × 1 minute/60 seconds) = n / (table speed in degrees/second). The total number of shots to be taken is N = int(360/n), yielding the images P₁ . . . P_N.
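The timing relationship above can be worked through in a short sketch. The function name `time_lapse_plan` and its parameter names are assumptions introduced for illustration; the arithmetic follows the interval and shot-count expressions stated in the preceding paragraph.

```python
def time_lapse_plan(step_degrees: float, table_rpm: float):
    """Return (interval_seconds, shot_count) for an open loop rotating stage.

    step_degrees: desired rotation between shots (n in the text).
    table_rpm:    table speed in rotations per minute.
    Hypothetical helper written only to illustrate the formula.
    """
    if step_degrees <= 0 or table_rpm <= 0:
        raise ValueError("step_degrees and table_rpm must be positive")
    # rotations/minute * 360 degrees/rotation * 1 minute/60 seconds
    degrees_per_second = table_rpm * 360.0 / 60.0
    interval_seconds = step_degrees / degrees_per_second
    shot_count = int(360 / step_degrees)  # N = int(360/n)
    return interval_seconds, shot_count
```

For example, a table turning at 0.5 rotations per minute moves 3 degrees per second, so shots every 15 degrees call for an exposure every 5 seconds and 24 shots in total.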

The slow speed rotational table and editing/authoring applications may be “shrink wrapped” together to provide a complete 3D image acquisition solution to end-users. This package may be combined with the cryptographic based licensing techniques described in this document or, if desired, with other well known license management techniques, giving a simple, low cost and convenient method for forming interactive image sequences, with 3D characteristics in particular.

Self Contained View Acquisition System Kiosk

Some individuals may not wish to purchase and install the required elements for image acquisition, as described herein, for reasons of convenience and expense. It is desirable to offer a vending system which incorporates the necessary elements for carrying out the image acquisition in a simple self-service manner. Such a device can be put in convenient public locations such as retail stores, which would permit the displayer to avail himself of the scanning and posting capabilities of the machine by bringing the object of interest with him to that location. The automatic capabilities of the machine include the process of automatically acquiring, processing and publishing the omni-directional and distance views.

A Self Contained View Acquisition System Kiosk (see FIG. 7), whose preferred embodiment is described herein, is connected to the Host Application Server of FIG. 2, which has the function of storing and sending the application Program to the PC at the request of the PC. In the first step, the object of interest can be placed on a computer controlled turntable (see FIG. 8) with a camera pointing system having computer controlled adjustable camera parameters such as zoom, focus and pan-tilt, and the turntable commanded to rotate to a succession of rotational angles, for each of which a digital image is acquired.

Once the views are acquired and temporarily stored on the PC, they can be adjusted and formatted into a media object using the Processing Application, using any one of a number of different formats which are suitable for the economical storage and transmission of image sequences. Potential encodings are described later.

These view sequence files are transmitted and arrive at the Host Application Server, where they are indexed and stored in the Storage and Caching Server(s) for future retrieval. A view sequence is cataloged by a unique identifier which allows for the particular view sequence to be retrieved and viewed subsequently from the database within the Storage and Caching Server.
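The cataloging of view sequences by unique identifier can be pictured with a minimal in-memory sketch. The class name `ViewSequenceCatalog` and its methods are hypothetical, and a dictionary stands in for the database of the Storage and Caching Server; an actual server would use persistent storage.

```python
class ViewSequenceCatalog:
    """Toy index of view sequences keyed by unique object identifier.

    Hypothetical stand-in for the server database; for illustration only.
    """

    def __init__(self):
        self._sequences = {}

    def store(self, object_id: str, sequence: bytes) -> None:
        # Identifiers are meant to be unique, so refuse to overwrite silently.
        if object_id in self._sequences:
            raise ValueError("identifier already cataloged: " + object_id)
        self._sequences[object_id] = sequence

    def retrieve(self, object_id: str) -> bytes:
        # Raises KeyError if no view sequence was cataloged under this identifier.
        return self._sequences[object_id]
```

Rejecting a duplicate identifier is a design choice of this sketch, made so that one identifier always maps to exactly one view sequence.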

Embodiment for Self Contained Scanner

It is important for the displayer of goods to be able to easily and rapidly generate the omni-directional and omni-distance views of the object. The displayer of goods on the internet should be able to easily and conveniently generate omni-view sequences of objects and publish and link them. In a preferred embodiment, a kiosk such as illustrated in FIG. 7 can be used in a self-service fashion. The unit, which is countertop mounted, has a turntable access door which can swing open so that the user can place the object to be acquired inside of the housing and close the door. The user then places the printed bar code label in front of the bar code acquisition unit, which captures the unique object identifier. The user then may use the touch screen on the visual display to activate the collection of the view sequence. Once the view sequence collection is complete, the user may interactively preview the scan using the visual display.

Kinematic Configurations for Self-Contained Scanner

A number of different kinematic configurations for the scanner are possible in order to accomplish the acquisition of views from different directions. FIG. 8 illustrates the kinematic articulation for the cylindrical turntable scanner configuration. In particular, the camera elevation degree-of-freedom (DOF) and pitch DOF, as well as the turntable rotation DOF, are actuated and computer controlled.
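One way the three actuated degrees of freedom of the cylindrical configuration might be sequenced is sketched here. The geometry is an assumption made for the example and is not taken from FIG. 8: camera heights are measured relative to the turntable center, the camera sits at a fixed horizontal distance from the turntable axis, and the pitch is chosen so that the camera looks at the turntable center. The function name `cylindrical_scan_plan` is hypothetical.

```python
import math


def cylindrical_scan_plan(camera_heights, camera_distance, rotation_step_degrees):
    """Return a list of (turntable_angle_deg, camera_height, pitch_deg) poses.

    Assumed geometry (illustrative only): height is relative to the turntable
    center, and a negative pitch tilts the camera down toward that center.
    """
    steps = int(360 / rotation_step_degrees)
    plan = []
    for height in camera_heights:
        # Tilt down by the angle subtended by the height at the camera distance.
        pitch = -math.degrees(math.atan2(height, camera_distance))
        for i in range(steps):
            plan.append((i * rotation_step_degrees, height, pitch))
    return plan
```

Under these assumptions, a camera raised to a height equal to its horizontal distance from the axis would be pitched down 45 degrees, and a 90 degree rotation step would yield four turntable positions per elevation.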

An alternative view embodiment which constrains the view direction to the origin (center of rotation of the turntable) is illustrated in FIG. 9. In this embodiment, the PITCH DOF and YAW DOF correspond to pan and tilt relative to the current ELEVATION DOF along a given arc support. The Turntable ROTATION DOF is the same as in the cylindrical kinematic configuration.

If desired, a number of cameras may be laid out in a semi-circular configuration as illustrated in FIG. 10. While more restrictive, this configuration allows for the elimination of any moving parts and for simultaneous acquisition in the view sequence acquisition system, at the expense of the need for more cameras. Additionally, the set of cameras may be mounted on a serial articulated linkage such as a spiral wound gooseneck, and positioned arbitrarily along a given trajectory to form a particular sequence of views.

Hardware Modules for Self Contained Scanner

In particular, such a kiosk would have the following hardware components, as illustrated in FIG. 12. A digital camera would be utilized to acquire digitized high resolution color or black and white images of the object. The camera would have electronically adjustable gain and integration time, which would be achieved by use of a camera interface module. The camera would be fitted with a computer controllable actuated lens which would allow for adjustment of zoom, focus and iris. The camera would be positioned on a camera platform which would allow for computer control of the camera height and pitch. A computer controlled turntable (rotational positioner) would allow for computer command of turntable rotational angle. The actuated lens, camera platform and turntable would all be controlled by an actuator controller module. The Illumination Control Module would serve to control the illuminators in the system. The micro-controller board would be responsible for the overall system coordination and control of modules. The bar code acquisition system would be used to scan and extract the coded unique object identifier alphanumeric strings which the displayer would bring to the kiosk to identify the object(s) that they are scanning. The bar code acquisition system would be controlled and communicated with via the bar code acquisition interface module. The display controller module would generate any video needed for the graphical user interface and view sequence preview, which is displayed on the visual display unit (an LCD or CRT in the preferred embodiment). The network interface module carries out the communication to the network connection which accesses the application server computer host.

The following software modules would be executed by the PC micro-controller board, as illustrated by FIG. 13. The executive is responsible for the overall system sequencing, coordination and control of modules. The GUI module is responsible for rendering graphical screen elements and managing user inputs, and utilizes the hardware capabilities of the display controller module, the visual display unit and optionally the keyboard or touch screen. The network communications protocol stack manages communications between the kiosk and the Host Application Server, as illustrated in FIG. 2. The image acquisition module uses the capabilities of the camera interface module to acquire digital images of the object. The image quality evaluation module processes the acquired images and image sequences, computes figure-ground separation of the object being view sequenced, determines the extents of the object in image space, and selects zoom, focus, iris, gain and exposure values for the camera lens and camera to achieve a high quality view sequence. The selected actuator parameters are used by the executive to actuate the system actuators via the lens control module, turntable control module, camera control platform, and camera platform control module while synchronizing image acquisitions at the appropriate points. The lens control module, turntable control module, camera control platform, and camera platform control module in turn use the services of the actuator control module to achieve the actuator control and motion. The resulting complete sequence is then processed by the sequence compression and formatting module. Once the sequence is complete and accepted by the user, the executive uses the network communications protocol stack to establish a session with the application server and then transmits the view sequence along with the unique object identifier, which is acquired via the bar-code acquisition control module.

Processing Application

Overview of the Multi-media Authoring Process

As indicated in FIG. 2, the system consists of a distributed set of processing elements (in the preferred embodiment these are microprocessor based computing systems) connected via a communications network and protocol. The user desiring to edit images or create multimedia programs uses a client processing element with a display in order to modify images and generate multimedia programs. Taking the elements of FIG. 2 and redrawing them yields an embodiment of such a distributed system, as illustrated in FIG. 4. The client computing element may either be a system of low computing capability which merely functions as a display manager, as indicated in FIG. 4, or a fully capable high computing power workstation, as indicated in FIG. 5.

The term author refers to the person involved in the creative editing and enhancement of the images and/or the authoring of the multimedia program which uses those images to yield an interactive multimedia presentation to the end-user of the multimedia program.

In general, to make an interactive multimedia object program, two major functions are needed: digital image processing and enhancement, and creation of the multimedia program which operates on the media objects, such as the digital images, and handles interpretation of user input to create the overall multi-media presentation to the user. The first function ensures that the properties of the images used in the multi-media program meet the requirements of the author. This is commonly known as digital image enhancement and editing, and the methods for our system in this regard are described later herein. For example, the author may modify the resolution, sharpen the image, change the color palette etc., using a number of well known image processing operations that are common in the prior art. Examples of these operations include contrast enhancement, brightness modification, linear filtering (blurring/sharpening) and thresholding. The select, edit and review cycle for the image processing is depicted in FIG. 14. The second function, the multi-media programming function, consists of writing the multimedia program (or applet), which uses these images along with other input media elements such as sounds (the media objects). The resulting program responds to the end-user's inputs by generating output multimedia events. Examples of multimedia events include generating, selecting and rendering new images or video sequences, and playing digitized sounds, in response to those inputs. The multimedia authoring cycle is illustrated in FIG. 15.

A generic overall work flow for the creation of multimedia content is depicted in FIG. 16. Within this overall work flow, a number of implementations and embodiments are possible. For example, the images may be uploaded to a remote server and processed at the server, with the results of the processing being sent back to the client so that the author may see them, as in FIG. 4. Alternatively, the editing and authoring programs which carry out the processing of local images and the authoring of multimedia programs may be downloaded from a server and used to edit images and other media objects local to the client computer and to form multimedia programs, as illustrated in FIG. 5. Furthermore, this application may execute as part of the web browsing program by any one of a number of well known techniques for extending the functionality of browsers, such as plug-ins and Microsoft ActiveX™ extensions. This allows the users to access the application within their web browser and within a specific web page, rather than within a separate desktop application.

Specifically, the editing and authoring program may be encapsulated as an extension to a web browsing application by being packaged in the form of a Microsoft COM or ActiveX component, which may be downloaded on demand when a particular page of HTML hosted by the application server is accessed. Furthermore, this application may be signed by the application's creator using a trusted digital certificate. The application is small in size and can download and install quickly.

Applet Media Player

In this context, the applet is used to manage the rendering and playback of multimedia objects such as images and sounds. These multimedia objects can either be stored on a web server, or encapsulated monolithically with the applet in an archive, such as a Java Archive (JAR) file. Alternatively, these multimedia programs may be encoded in a particular standardized multimedia or script format such as the Macromedia Flash format.

As illustrated in FIG. 19, once the view sequence file has been created and stored in the database, it may be retrieved via a command to the Storage and Caching server and sent to the viewer's client computer, where a viewer application or applet interprets, unpacks and renders the omni-directional views in an interactive fashion. If a user wishes to view a particular interactive omni-directional view sequence, s/he may retrieve the sequence of interest from the database to their client computer using the above mentioned unique identifier. Once the view sequence has been retrieved and is available at the client, the viewer may view the sequence using an interactive viewer application program (applet), which allows for the interactive selection of views of the object of interest. The applet consists of an interactive set of on-screen controls which, when modified by the viewer, allow different views of the object to be selected. In particular, by rapidly and smoothly scrolling through a continuous set of views, the appearance of smooth object rotation and a three-dimensional effect may be achieved. The state diagram for the viewing applet is depicted in FIG. 17.

In particular, the Application Server may host the image, but the image may be referenced and indirectly included in the merchant's web site via a URL reference in the merchant's web site. A similar mechanism may be used for a particular posting in a classified ad, or in an on-line auction placement.

An example of the output of a Java language based viewer applet is illustrated in FIG. 1. The user can interactively slide the slider bar graphical user element to the left or right to cause the viewed object to rotate to the left or right by selection of appropriate views in the view sequence.

Authorization of Content for Distribution and Playback

As mentioned in the introduction, it is desirable to enable the “pay-per-use” distribution of the software application which permits the creation of the interactive multi-media sequences. In order to ensure the proper licensing of the resulting interactive program, it is desirable that the multimedia program or applet be bound to the set of media objects through the use of a digital signature. The digital signature can be used to check the integrity of the multimedia object sequence, to enforce the binding of a unique applet or set of applets to a set of multimedia objects, and to enforce copyrights etc. This is described in the following section.

In order to enforce the proper consideration (payment) in exchange for licensed use of multimedia programs, cryptographic techniques are used to ensure that the multimedia sequences and objects have been generated in an authorized fashion. In particular, authorization for the interactive viewing of a sequence can be accomplished by checking that a uniquely generated multimedia program is bound cryptographically to the particular set of media objects which it uses as part of its multimedia interactive program.

Secondly, the particular set of media objects can be authenticated (independent of the player) as having been bound together and processed in an authorized fashion which guarantees that payment has been made. If the authorization for the collection of media objects fails, then the player will not play the multimedia presentation. This ensures that the user of the multimedia program will only use the media program when properly licensed by the entity which controls the multimedia authoring and image editing capabilities.

Binding an Ordered Set of Multimedia Objects to an Applet Using Symmetric Cryptographic Algorithms

The Notation used in the exposition is as follows:

Let the message M={O₁, . . . , O_(N)} be the ordered concatenation of the set of multimedia objects as encapsulated.

E_(k)(M) is defined as the encryption of message M using key k with a symmetric key algorithm, e.g. DES with a 56-bit key.

D_(k)(M) is defined as the decryption of message M using key k with a symmetric key algorithm, e.g. DES with a 56-bit key.

H(M) is defined as the secure hash of message M using, for example, the MD-5 algorithm, although any one of a number of proven secure hash algorithms will suffice.

S=S_(k)(M) is defined as the digital signature of the secure hash of message M, shorthand for E_(k)(H(M)), computed for example using the NIST DES algorithm in Cipher Block Chaining mode.

V_(k)(S) is defined as the validator of the signature S of M, shorthand for the check D_(k)(E_(k)(H(M))) =? H(M), where H(M) can be independently computed by the validation computer since H is a well known hash function.

Signature w/Non Encrypted Content

In order to bind the applet viewer to a particular multimedia sequence, a symmetric encryption key is embedded in the viewing applet. This key, k, is used as the basis of binding the multimedia object sequence to an applet which can view it. The embedding of the key can be accomplished in a variety of different ways; we describe two approaches which can be used in the preferred embodiment. In the first approach, the media player applet byte code and the key file encoding the encryption key k are inserted into an archive such as a Java archive (JAR) file, as is illustrated in FIG. 1. An alternative approach is to insert the key value into the Java source code corresponding to the media player applet code and then compile the source code into Java byte code which has the key embedded.

The encapsulation set consists of applet A(k) with key k embedded within it, M, the ordered sequence of multimedia objects, and S, the signature of the sequence M. This is described notationally as {A(k),M,S_(k)(M)}. It is also possible to split apart the archive into the applet and media sequence: {A(k)}{M,S_(k)(M)}, where {A(k)} is on one computing system and {M,S_(k)(M)} is on another computing system. It is preferable to superencrypt the key k with another embedded key, k2, to make it more challenging to extract the key k. The superencryption key is embedded in the applet as well.

The processing sequence is as follows:

1. The signing key k is generated within the client-side application.

2. The client computes S_(k)(M).

3. The client sends k and S_(k)(M) to the server.

4. The server creates A(k).

5. The client sends M to the Application Hosting Server.

6. The application hosting server creates the encapsulation {A(k),M,S_(k)(M)} and stores it in the storage and caching server.

Signature w/Encrypted Media Set Encapsulation

If it is desired to sign and encrypt the contents of the message M in the encapsulation, the following items can be generated: {A(k),E_(k)(M),S_(k)(M)}, where E_(k)(M) represents the encrypted media object sequence M.

The processing sequence to generate these items is as follows:

1. The signing key k is generated within the client-side application.

2. The client computes S_(k)(M).

3. The client encrypts M, yielding E_(k)(M).

4. The client sends k and S_(k)(M) to the server.

5. The server creates A(k).

6. The client sends E_(k)(M) to the Application Hosting Server.

7. The application hosting server creates the encapsulation {A(k),E_(k)(M),S_(k)(M)} and stores it in the storage and caching server.

Playback

Checking Authorization for Playback (Unencrypted Media)

When the media program is requested, the storage and caching server retrieves the matching applet as indicated in FIG. 18, which results in the data bundle consisting of {A(k),M,S_(k)(M)} arriving at the end-user computer.

Upon receipt, the bundle is split into the applet with embedded key, A(k), the media sequence M, and the digital signature S_(k)(M).

The applet begins execution and carries out the following steps:

1. Computes the secure hash H over M, H(M)

2. Computes the validator H′ of the appended media sequence signature, where H′=D_(k)(S_(k)(M))=D_(k)(E_(k)(H(M))).

3. If H′=H(M), the computed hash and the decrypted secure hash match. The message signature is then judged as valid and the sequence is displayed; otherwise the applet will not execute the interactive display of the media objects.
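The signing and validation steps above can be sketched as follows. This is a minimal illustration only, not the implementation of the preferred embodiment: the Python standard library provides no DES, so a keyed HMAC over SHA-256 is swapped in for E_(k)(H(M)), and the check recomputes the keyed value rather than decrypting the signature. The function names, the key and the media bytes are hypothetical.

```python
import hashlib
import hmac


def secure_hash(message: bytes) -> bytes:
    """H(M). SHA-256 is an assumed stand-in for the MD-5 example in the text."""
    return hashlib.sha256(message).digest()


def sign(key: bytes, message: bytes) -> bytes:
    """Stand-in for S_k(M) = E_k(H(M)): a keyed MAC computed over H(M)."""
    return hmac.new(key, secure_hash(message), hashlib.sha256).digest()


def authorize_playback(key: bytes, media: bytes, signature: bytes) -> bool:
    """Applet-side check: recompute the keyed value over M and compare it
    with the signature that arrived in the bundle."""
    return hmac.compare_digest(sign(key, media), signature)


# Hypothetical data: three media objects concatenated in order to form M.
media_objects = [b"view-000", b"view-001", b"view-002"]
M = b"".join(media_objects)
k = b"hypothetical-embedded-key"

S = sign(k, M)                                  # computed at authoring time
ok = authorize_playback(k, M, S)                # intact bundle
tampered = authorize_playback(k, M + b"x", S)   # media altered after signing
```

With the intact bundle the check succeeds and the sequence would be displayed; after any change to M, or with a different key, it fails and playback would be refused.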

Playback (Encrypted Media)

When the media program is requested, the storage and caching server retrieves the matching applet as indicated in FIG. 18, which results in the data bundle consisting of {A(k),E_(k)(M),S_(k)(M)} arriving at the end-user computer.

Upon receipt, the bundle is split into the applet with embedded key, A(k), the encrypted media sequence E_(k)(M), and the digital signature S_(k)(M).

The applet begins execution and, for the encrypted media encapsulation, carries out the following steps:

1. Applet A uses its embedded key k to decrypt the sequence E_(k)(M), yielding the original plaintext multimedia sequence M=D_(k)(E_(k)(M)).

2. Applet A computes the secure hash H over M, H(M)

3. Applet A computes the validator of the appended media signature, H′=D_(k)(S_(k)(M)).

4. If H′=H(M) (the computed hash and the decrypted secure hash match), then the message signature is judged as valid and the sequence is displayed; otherwise the applet will not execute the interactive display of the media objects.

Single Applet or One-to-Few (Per-Customer Key) Sequence Validation via Embedded Symmetric Key

The key k embedded in the applet can be a universal key, where all generated applets contain it. However, if this key is compromised, then new sequences can be generated that will work with all applets. Optionally, a “customer key” can be allocated for each entity doing business with the applet generation service. In this case, if a key is compromised, only that customer's applets will be “cracked”, and the key cannot be used to generate sequences that work with other customers' applets. However, once an applet is “cracked” it can be published along with the key and signing algorithm, allowing others to create view sequences outside of licensing.

Next, another approach is described using public key cryptography which is similar in spirit to this approach, but avoids embedding a symmetric key in the applet which could potentially be compromised, thus compromising licensing of all sequences with the common key.

Binding an Ordered Set of Multimedia Objects to an Applet Using Public Key Cryptographic Algorithms

An alternative approach is to use a public key approach where there is a “company” public key k_(pub) which is well known and published, signed by a certificate authority and also embedded in the applet A(k_(pub)), which is universally distributed (at least in a large number of applets), and a corresponding private key k_(priv) which is kept secure and confidential.

Public Key Signature w/Non Encrypted Content

In the sequence creation process, the following steps occur:

1. The client creates a secure hash H(M) of the media sequence M, and H(M) is sent to the application hosting server.

2. The client sends M to the application hosting server.

3. The application hosting server then uses the private key k_(priv) to encrypt H(M), yielding E_(kpriv)(H(M)).

4. The server creates A(k_(pub)), an applet with the public key embedded within it.

5. The application hosting server creates the encapsulation {A(k_(pub)),M,E_(kpriv)(H(M))} and stores it in the storage and caching server.

Public Key Signature w/Encrypted Content

In the sequence creation process, the following steps occur:

1. The client creates a secure hash H(M) which is sent to the server.

2. The client creates a symmetric key k which is to be used to encrypt the media sequence.

3. The client encrypts M, yielding E_(k)(M).

4. The client sends E_(k)(M) to the application hosting server.

5. The server then uses the private key k_(priv) to encrypt H(M), yielding E_(kpriv)(H(M)).

6. The server creates A(k_(pub),k), an applet with the public key embedded within it, as well as the media decryption key.

7. The application hosting server creates the encapsulation {A(k_(pub),k),E_(k)(M),E_(kpriv)(H(M))} and stores it in the storage and caching server.

Checking Authorization for Playback (Unencrypted Media with Public Key)

When the media program is requested, the storage and caching server retrieves the matching applet as indicated in FIG. 18, which results in the data bundle consisting of {A(k_(pub)),M,E_(kpriv)(H(M))} arriving at the end-user computer. The applet then carries out the following steps:

1. Computes H(M)

2. Computes H′=D_(kpub)(E_(kpriv)(H(M))).

3. If H′=H(M), the computed hash and the decrypted secure hash match. The message signature is then judged as valid and the sequence is displayed; otherwise the applet will not execute the interactive display of the media objects.
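The public key check above can be illustrated with a deliberately tiny textbook RSA key pair. This is a toy sketch under stated assumptions: the primes 61 and 53 (modulus 3233, public exponent 17, private exponent 2753) are far too small to be secure, the hash is reduced modulo the modulus so that it fits, SHA-256 stands in for the hash function, and all names are hypothetical. It shows only the shape of E_(kpriv)(H(M)) and D_(kpub), not a production signature scheme.

```python
import hashlib

# Toy RSA parameters (assumption, insecure): n = 61 * 53, e * d = 1 mod 3120.
N_MOD = 3233
E_PUB = 17       # k_pub, embedded in the applet
D_PRIV = 2753    # k_priv, held only by the application hosting server


def hash_to_int(message: bytes) -> int:
    """H(M) reduced into the range of the toy modulus."""
    digest = hashlib.sha256(message).digest()
    return int.from_bytes(digest, "big") % N_MOD


def server_sign(hash_value: int) -> int:
    """Server side: E_kpriv(H(M)), applied to the hash sent by the client."""
    return pow(hash_value, D_PRIV, N_MOD)


def applet_verify(media: bytes, signature: int) -> bool:
    """Applet side: H' = D_kpub(signature), then compare H' with H(M)."""
    h_prime = pow(signature, E_PUB, N_MOD)
    return h_prime == hash_to_int(media)


M = b"view-000view-001view-002"           # hypothetical media sequence
signature = server_sign(hash_to_int(M))   # stored in the encapsulation
valid = applet_verify(M, signature)
forged = applet_verify(M, (signature + 1) % N_MOD)
```

Because only the public exponent is embedded in the applet, extracting it does not allow new signatures to be produced, which is the advantage this approach has over the embedded symmetric key.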

Checking Authorization for Playback (Encrypted Media with Public Key)

When the media program is requested, the storage and caching server retrieves the matching applet as indicated in FIG. 18, which results in the data bundle consisting of {A(k_(pub),k),E_(k)(M),E_(kpriv)(H(M))} arriving at the end-user computer. The encryption key k for the media sequence may optionally be superencrypted by a static key embedded in the applet byte code to make defeating the algorithm more difficult. The applet then carries out the following steps:

1. Decrypts E_(k)(M), yielding M, using key k.

2. Computes H(M).

3. Computes H′=D_(kpub)(E_(kpriv)(H(M))).

4. If H′=H(M), the computed hash and the decrypted secure hash match. The message signature is then judged as valid and the sequence is displayed; otherwise the applet will not execute the interactive display of the media objects.

Billing

The above authorization and authentication techniques provide a convenient means for billing and payment in exchange for creation of multimedia sequences.

The user can authenticate themselves to the applet generation server by providing an authenticator along with S and k generated from the authoring/editing program in the section above. If the authenticator for the user is validated by the server (e.g. the password and userid are a valid combination), then the applet server charges the user's account appropriately for the requested service and creates the applet. Payment may be by a credit card transaction, or by debiting credits on record for that particular user using the payment processor illustrated in FIG. 5.

In an alternative embodiment, signed credits may be sent down to the client station in a lumped set. The authoring application may be given the authority to generate applets and sign media sequences using the techniques described in the previous sections. The signed credits consist of random numbers (nonces) that are signed with the private key of the applet generation service. The client-side generator validates the credit using the local copy of the applet generator's public key. If the validation succeeds, then the applet may be generated and the media sequence signed using the credit.

The credit file is encrypted using a symmetric key which is embedded in the generator application, which has a unique serial number. Key agreement between the client side and the server side can be done using Diffie-Hellman key agreement. Whenever the client-side generator needs to generate a new applet, it decrypts the file, reads the index for the last used credit, increments it, and then validates the public key signature of the next credit. If validation succeeds, it uses the next credit nonce as the key k for the generated sequence in the techniques above for authentication and authorization. The index in the file is updated to point to the next record, and the file is re-signed using a message authentication code and re-encrypted. Alternatively, it may use the public key signing approaches described in the previous sections.

Image Processing

Masking Techniques

The identification and masking of background from foreground objects of interest is often desirable in photography, for example in catalog photographs. Once the foreground and background are identified, a number of other image special effects are also possible. In addition to masking of the background, a digital matteing process can be carried out which generates a composite image. The composite image is composed of image sources from two or more images. Regions identified as being of one type (e.g. background) are substituted with source image information from another image, while regions identified as another type (e.g. foreground) are not modified. This allows backgrounds to be replaced with synthetic backgrounds or other desirable images. In the art, the identification of foreground and background has been done using a variety of means. For example, it has been done manually with hand masking tools within digital editing programs, which can be a tedious and time consuming process to do properly. Other common approaches employ colored backgrounds which can be identified and automatically detected through computer or video processing (chroma-key techniques). However, chroma-key techniques have the disadvantage of requiring a large and cumbersome background backdrop of a particular color, which often must be changed to make sure the background color is of a particular shade that is not contained in the foreground object of interest. We present two techniques, image subtraction and motion segmentation, which avoid these inconveniences.

Automatic Background Removal Using Image Subtraction

Background Identification

In general, given two images, one with a foreground object and the other without, the background areas, which are taken under similar scene illumination and camera settings, will have very similar color or gray scale values in the situations with and without the foreground object, whereas areas that contain the foreground object in one image but not in the other will have a large absolute difference.

This large absolute difference or vector difference magnitude will indicate the presence of a foreground object of interest. By selecting pixels whose difference is above a relatively small threshold in terms of gray level or color magnitude (brightness), a mask can be formed which selects only foreground object pixels.

In the case where the background scene is complex and cluttered, it is important to align the two images. This ensures pixel-to-pixel correspondence between the two images; without it, misaligned background detail will produce spurious differences. The alignment can be done in two ways. The first is to mechanically align the two images during the acquisition step by making sure the camera is held fixed, such as on a tripod. The second is to employ electronic stabilization, either within the camera, to track and align the background between two scenes, or after the acquisition, where identical background features in the two backgrounds can be matched and the backgrounds aligned using affine or other warping techniques.

In this document, |P_(i)−P_(j)| refers, without loss of generality, to either the grey-scale absolute difference or the color space vector difference, depending on whether the image set is monochrome or color.

The per-pixel gray-scale difference is defined as D(x,y)=|I₁(x,y)−I₂(x,y)|, where D(x,y) is the pixel grey value in the difference image D at location x,y, and I₁(x,y) and I₂(x,y) refer to the pixel grey value at the x,y coordinate in the input images 1 and 2 respectively.

In the case of color images, the magnitude of the difference of the RGB vectors may be used, as illustrated in FIG. 20. More specifically, let I_(R)(x,y), I_(G)(x,y) and I_(B)(x,y) be the R, G, B components of a pixel in an image at coordinates x,y, and let I_(RGB)(x,y) be the color vector for the pixel at coordinate x,y. Let D_(RGB) represent the color vector at coordinate x,y in the color difference image. The color vector difference is defined as D(x,y)=|I_(1RGB)(x,y)−I_(2RGB)(x,y)|, where I_(1RGB)(x,y) and I_(2RGB)(x,y) represent the pixel-wise RGB vectors for input images 1 and 2. Here the “−” operator represents the vector difference, and “|” represents the magnitude of a vector.
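The two difference definitions can be written out directly. The sketch below assumes images are represented as nested Python lists (rows of grey values, or rows of (R, G, B) tuples); that representation and the function names are illustrative choices, not part of the described system.

```python
import math


def gray_difference(image1, image2):
    """D(x,y) = |I1(x,y) - I2(x,y)| for two grey-scale images of equal size."""
    return [
        [abs(a - b) for a, b in zip(row1, row2)]
        for row1, row2 in zip(image1, image2)
    ]


def rgb_difference(image1, image2):
    """D(x,y) = |I1RGB(x,y) - I2RGB(x,y)|: the Euclidean magnitude of the
    difference of the two RGB vectors at each pixel."""
    return [
        [math.dist(p, q) for p, q in zip(row1, row2)]
        for row1, row2 in zip(image1, image2)
    ]


gray_d = gray_difference([[10, 200]], [[12, 50]])
color_d = rgb_difference([[(0, 0, 0), (9, 9, 9)]], [[(3, 4, 0), (9, 9, 9)]])
```

In both cases the result is a single magnitude per pixel, so the same thresholding step can be applied to monochrome and color image sets.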

The background identification process may be automated using a sequenceof image processing steps as follows.

1. A picture P₀ of the scene without the foreground object of interest is digitized.

2. The foreground object is placed in the scene and another picture P₁ is digitized.

3. A third synthetic image D₁, which consists of the pixel-wise absolute difference |P₀−P₁|, is formed.

4. D₁ is then thresholded automatically using an automated histogram derived thresholding technique.

The resulting image is a binary mask image M₁ where all pixels above a certain magnitude are marked as “1”, meaning foreground; otherwise they are marked as “0” for background.

The mask is applied by scanning each mask pixel M₁(x,y). Whenever the mask pixel takes on the value “0” (background), the corresponding pixel at coordinate (x,y) in the input image, P₁(x,y), is set to the default background intensity or color value (see FIG. 21).
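A minimal sketch of steps 1 through 4 and the mask application follows, assuming small grey-scale images held as nested lists and a fixed, hand-picked threshold in place of the automated histogram technique. The pixel values, the threshold of 20 and the default background value of 255 are hypothetical.

```python
def difference_image(p0, p1):
    """Step 3: pixel-wise absolute difference |P0 - P1|."""
    return [[abs(a - b) for a, b in zip(r0, r1)] for r0, r1 in zip(p0, p1)]


def threshold_mask(diff, threshold):
    """Step 4: binary mask M1, 1 = foreground (above threshold), 0 = background."""
    return [[1 if d > threshold else 0 for d in row] for row in diff]


def apply_mask(image, mask, background_value):
    """Set every pixel whose mask value is 0 to the default background value."""
    return [
        [pixel if m else background_value for pixel, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# Hypothetical 2x3 images: P0 is the empty scene, P1 contains an object
# in the middle column; the other pixels differ only by sensor noise.
P0 = [[10, 11, 12],
      [10, 11, 12]]
P1 = [[10, 90, 12],
      [11, 95, 13]]

D1 = difference_image(P0, P1)
M1 = threshold_mask(D1, threshold=20)
masked = apply_mask(P1, M1, background_value=255)
```

The small differences caused by noise (values of 0 or 1 here) fall below the threshold, so only the object column survives in the masked image.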

In step 4 above, any one of a number of bimodal automated histogram threshold selection techniques may be used. The bulk background difference, from regions where both images contain background, will appear as the first large spike in the histogram at a low magnitude value, followed by other peaks at higher values due to regions in the image that come from the difference of the foreground and background objects. For example, a peak finding operator may be applied to the histogram to identify all peaks (intensity values whose neighboring values have fewer occurrences), and the threshold set between the peak at the smallest value and the next peak at a larger value (see FIG. 22).
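One possible reading of the peak-based threshold selection is sketched below. It is an assumption-laden illustration: a peak is taken to be a non-empty histogram bin that is higher than its left neighbor and at least as high as its right neighbor, the threshold is placed midway between the two lowest-valued peaks, and no smoothing is applied, although a real difference histogram would normally need smoothing first. The function names and sample values are hypothetical.

```python
def histogram(values, bins=256):
    """Count occurrences of each integer difference value in [0, bins)."""
    counts = [0] * bins
    for v in values:
        counts[v] += 1
    return counts


def find_peaks(counts):
    """Indices of non-empty bins that rise above their left neighbor and are
    not exceeded by their right neighbor (out-of-range neighbors count as -1)."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else -1
        right = counts[i + 1] if i + 1 < len(counts) else -1
        if c > 0 and c > left and c >= right:
            peaks.append(i)
    return peaks


def select_threshold(counts):
    """Midpoint between the lowest-valued peak (bulk background) and the next."""
    peaks = find_peaks(counts)
    if len(peaks) < 2:
        raise ValueError("histogram is not bimodal; no threshold selected")
    return (peaks[0] + peaks[1]) // 2


# Hypothetical difference values: background near 0, object near 80.
diff_values = [0] * 5 + [1] * 2 + [80] * 4 + [81]
counts = histogram(diff_values)
peaks = find_peaks(counts)
theta = select_threshold(counts)
```

For this sample the background spike sits at 0 and the object peak at 80, so the threshold lands well clear of both.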

It is often necessary to carry out a morphological dilation operation to suppress small impulsive holes in the absolute difference mask and to extend the object boundaries for feathering smooth edges.
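Binary dilation can be sketched as follows, assuming a 3x3 square structuring element (the text does not specify one) and a mask stored as nested lists of 0 and 1. The function name is illustrative.

```python
def dilate(mask):
    """One pass of binary dilation with a 3x3 square structuring element:
    an output pixel is 1 if any pixel in its 3x3 neighborhood is 1."""
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width and mask[ny][nx]:
                        out[y][x] = 1
    return out


# A one-pixel hole inside the object is filled by a single dilation.
holed = [[1, 1, 1],
         [1, 0, 1],
         [1, 1, 1]]
filled = dilate(holed)

# A single foreground pixel grows into a 3x3 block, extending the boundary.
dot = [[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]]
grown = dilate(dot)
```

The same operation both closes small holes and pushes the mask boundary outward by one pixel per pass, which is what the soft blending shells rely on.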

Matteing Process

This mask can then be logically ANDed with P₁ (the image with the foreground object) to form a resulting composited image with the background removed entirely, or substituted with other image data if desired. By using an AND operation, all non-foreground pixels are suppressed, thus suppressing the background. In order to add in a composited background, the logical complement of the selection mask (M₁′=logical inverse (M₁)) is used to select pixels which are from the background, and these may be substituted pixel-wise with pixels from the desired new background image, which can be a synthetic or natural scene image from another source.

Soft Blending

The binary masking process can be generalized to a soft continuousblending of source images as follows.

In the preferred embodiment, image M₂ is formed, which is the dilated version of the original mask image M₁. Then M₂ is logically exclusive OR'd (XOR) with M₁ to form a shell boundary region mask, the mask shell M₂′, as indicated in FIG. 23. The mask M₂ can be dilated yet again to yield M₃, and the resulting shell mask M₃′ can be formed as M₃ xor M₂. In general, the shell mask for the nth iteration can be defined as M_(n)′=M_(n) xor M_(n−1). For each shell mask, a blending coefficient α_(n) is associated in a table.

The blended image P_(b) results from the pixel-wise linear combination of images P_(i) and P_(j). Each pixel coordinate x,y will be an element of one of the mask shells M₀, . . . , M_(n) or of the background. In the case that the coordinate x,y is an element of M_(n), the corresponding blending coefficient α_(n) is selected. The blended image pixel is set as P_(b)(x,y)=α_(n)P_(i)(x,y)+(1−α_(n))P_(j)(x,y), the linear combination of pixel values from the two source images.

A convenient way to set α_(n) is α_(n)=n/N_(max), where N_(max) is the maximum number of dilation iterations.
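The shell construction and blending can be sketched as below. Several choices here are assumptions rather than statements from the text: P_(i) is taken to be the new background image and P_(j) the foreground image, pixels inside the original mask get a coefficient of 0 (pure foreground), pixels outside every shell get 1 (pure background), shells are numbered from 1 starting at the first dilation, and dilation uses a 3x3 square element. Function names are illustrative.

```python
def dilate(mask):
    """Binary dilation with a 3x3 square structuring element."""
    h, w = len(mask), len(mask[0])
    return [
        [
            1 if any(
                mask[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ) else 0
            for x in range(w)
        ]
        for y in range(h)
    ]


def alpha_map(mask, n_max):
    """Per-pixel blending coefficient: 0 inside the mask, n / n_max on the
    nth shell (dilated mask XOR previous mask), 1 beyond the last shell."""
    h, w = len(mask), len(mask[0])
    alpha = [[0.0 if mask[y][x] else 1.0 for x in range(w)] for y in range(h)]
    current = mask
    for n in range(1, n_max + 1):
        grown = dilate(current)
        for y in range(h):
            for x in range(w):
                if grown[y][x] ^ current[y][x]:   # shell mask M_n'
                    alpha[y][x] = n / n_max
        current = grown
    return alpha


def blend(p_i, p_j, alpha):
    """P_b = alpha * P_i + (1 - alpha) * P_j, pixel by pixel."""
    return [
        [a * i + (1 - a) * j for i, j, a in zip(ri, rj, ra)]
        for ri, rj, ra in zip(p_i, p_j, alpha)
    ]


mask = [[1, 0, 0, 0, 0]]           # object occupies the leftmost pixel
alphas = alpha_map(mask, n_max=2)
foreground = [[100, 100, 100, 100, 100]]
background = [[0, 0, 0, 0, 0]]
blended = blend(background, foreground, alphas)
```

The result ramps from full foreground inside the mask to full background over two shells, instead of switching abruptly at the mask edge.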

Optical Flow Segmentation Based Background Identification

Another approach for the automatic determination of the object background is to use an optical flow thresholding technique. This approach can be used in the case where the object, having some visual pattern or texture, is placed on a textureless rotating platform in front of a fixed camera, and the background of the object is stationary. The background may be flat or textured, as long as it is stationary between acquisitions. In this case, the image space will have a static background with only the object and its support surface (the rotating stage) in motion. If the rotating platform is a flat featureless surface, then although it is moving, it will not generate any signal that can be picked up by the camera and will appear motionless.

Optical flow is defined as the spatial displacement of a small image patch or feature over a sequence of frames taken at different times, $\left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle$. There are a number of techniques in the prior art for calculating this value. Any one of a number of well known optical flow or motion extraction techniques can be used to operate on the sequence and compute the flow on a per-image basis. The flow can be computed using the frame of interest and adjacent frames, including the previous and/or succeeding frame, or extended adjacent sequences. Alternatively, simple image differencing may be used between succeeding frames if more computational simplicity is needed. In this case, instead of a displacement vector $\left\langle \frac{dx}{dt}, \frac{dy}{dt} \right\rangle$, a simple time derivative of image point x,y, $\frac{d}{dt} I(x,y)$, can be computed and thresholded. The flow magnitude or time derivative is computed at each image point x,y, and a magnitude field is created. The flow field for a representative image is illustrated in FIG. 39.

It is possible to compute local optical flow fields by taking frames with only a small relative object rotation between each frame and computing the pixel-wise or patch-wise local motion flow vector of the sequence. This can be done either with a still camera or through the use of a video sequence. Since only the object of interest will be moving in the sequence, pixels belonging to it will have a much higher optical flow magnitude, or time derivative, as the case may be. This constraint can be used to identify all pixels belonging to the object of interest in the sequence.

To summarize, for each image in the time sequence of images, the steps in the above approach are:

1. Compute the optical flow or time derivative for each pixel of each image in the image sequence. For optical flow, this will yield a flow vector (magnitude and direction) for each pixel.

2. Compute the magnitude for each pixel if the optical flow measure is used.

3. Threshold and label as foreground each pixel in the flow field with flow vector magnitude greater than threshold Θ; the remaining pixels are labeled as background. The threshold can be established using any one of a number of automated threshold detection techniques which work with bi-modal value distributions. Alternatively, a fixed threshold may be used.

4. The pixels selected as background can be used in the compositing process, where every pixel at x,y marked as background in the matte mask selects the pixel in the corresponding inserted background image at location x,y. The combined image will then contain the object in the foreground and the inserted artificial background from the composited background image. The soft blending technique described herein is applicable.
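Steps 2 through 4 can be sketched as follows, assuming the per-pixel flow vectors have already been produced by some optical flow method (step 1 is not implemented here) and are supplied as nested lists of (dx/dt, dy/dt) tuples. The fixed threshold, the pixel values and the function names are hypothetical.

```python
import math


def flow_magnitude(flow):
    """Step 2: magnitude of the flow vector <dx/dt, dy/dt> at each pixel."""
    return [[math.hypot(u, v) for (u, v) in row] for row in flow]


def label_foreground(magnitude, theta):
    """Step 3: 1 where the flow magnitude exceeds theta (moving object),
    0 elsewhere (stationary background)."""
    return [[1 if m > theta else 0 for m in row] for row in magnitude]


def composite(frame, matte, inserted_background):
    """Step 4: keep object pixels from the frame; background pixels take the
    value at the same x,y location in the inserted background image."""
    return [
        [f if m else b for f, m, b in zip(frame_row, matte_row, bg_row)]
        for frame_row, matte_row, bg_row in zip(frame, matte, inserted_background)
    ]


# Hypothetical 1x3 example: only the middle pixel belongs to the rotating object.
flow = [[(0.0, 0.0), (3.0, 4.0), (0.1, 0.0)]]
frame = [[10, 20, 30]]
new_background = [[97, 98, 99]]

magnitude = flow_magnitude(flow)
matte = label_foreground(magnitude, theta=1.0)
result = composite(frame, matte, new_background)
```

If the simpler time derivative is used instead of optical flow, `flow_magnitude` would be replaced by the absolute difference of succeeding frames and the remaining steps stay the same.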

Alignment

In the case that a freehand sequence of shots is taken by walking around a fixed object, camera motion may cause apparent rotation of the desired object, and non-uniform distance and camera pose may cause the object to move within the composition of the acquired image sequence. In this case it is desirable to allow the person forming the multimedia 3D sequence to scale, rotate and translate the foreground object of interest (known as rectification of the image sequence), so that as the sequence is viewed in the complete multimedia program it presents in a smoother form.

Visual Displays

The superposition and rectification sequence can be facilitated by a number of visual displays, such as performing edge extraction on the image sequence and superimposing adjoining or neighboring image pairs in the sequence to allow for fast visual inspection of coinciding scale, rotation and translation.

Preview

A sliding preview can be used to step through the image sequence and rapidly detect outlying values of scale, rotation and translation. As the person creating the sequence sees the jumping outlier, the offending frame may be marked for subsequent alignment.

Semi-Automated Alignment Using Affine Transforms

Easier-to-use semi-automated approaches to registration can be carried out by allowing the person carrying out the editing to select corresponding planar patches in adjoining images and then using geometric matching techniques to match features in the regions and recover the affine transformations between the patches. The affine transform or portions thereof (such as the rotational, scale or translational components) can be used to rectify the images by ignoring the projective (perspective) components.

The Alignment Wizard

The goal of the advanced editing functionality is to allow the end-user to correct for any errors that occurred during the camera picture taking process, especially when a hand held camera was used to take the images.

Using a hand held camera can lead to errors in the centering, orientation and scale of the desired object. This can cause jumpiness and other discontinuities when the final image sequence is viewed interactively. While it is not possible to correct perfectly for these errors, since that would require the full three dimensional structure of the scene, two dimensional operations on the resulting images can be quite helpful. These problems can be reduced to a great extent by the application of image-space transformations, including rotating, scaling and translating the images so that they are rectified (aligned) to the best extent possible. There are a number of approaches in principle for specifying which scaling, translation and alignment operations to apply, in which order and with what parameters. They range from approaches which are fully automated, to fully manual, to hybrids of the two. Additionally, any manual operations inevitably involve judgment from the end-user. Therefore an easy to use and intuitive tool set that guides the end-user in the rectification process (an alignment wizard) is highly desirable. Below, we describe a design for an alignment wizard.

The overall functional steps for the wizard are as follows:

1. Rotational Rectification

2. Translational Rectification

3. Scaling Rectification

4. Autocrop

The user interfaces, actions and other display functional requirements are described in more detail below.

Rotational Rectification

Given two or more images in the sequence, each taken from a different viewpoint, each image may have been taken with a differing roll angle about the optical axis of the camera. This roll can cause an apparent rotation of the object of interest (see FIG. 25 for an example). Since we desire to have the object rotate about its natural axis of symmetry, or some approximation thereof, the first step is to indicate the location of this axis in the image space. This is done by using a line drawing tool to draw a virtual axis of symmetry line superimposed on the image of interest (See FIG. 26). Since this axis of symmetry is generally perpendicular to the floor, the system can now compute the angle of the indicated line and counter rotate the entire image automatically so that the indicated axis of symmetry is parallel to the y-axis of the image frame, as illustrated in FIG. 27. Since a rotation operation requires a center of rotation, about which the rotation takes place, this must be selected. This can be done automatically by using the assumption that the photographer approximately centered the object when the photo was taken. In this case the mid-point of the indicated axis of symmetry line is a good candidate for the center of rotation.

After the rotation, the image is also translated horizontally in the x-axis direction such that the virtual axis of symmetry is centered laterally in the preview image x-axis coordinate system.

The above process is carried out by the user for each constituent image in the sequence and this completes the rotational rectification step. For clarity, a second input image (See FIG. 28) and the resulting aligned image are illustrated in FIG. 29.
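The angle computation and counter rotation can be sketched as below. This is an illustrative sketch under stated assumptions: the drawn axis is given by two endpoints in image coordinates, the center of rotation is the mid-point of that line as suggested above, and the function names `axis_correction` and `rotate_point` are hypothetical. Only point coordinates are transformed here; resampling the image pixels is left out.

```python
import math


def axis_correction(p1, p2):
    """Return (theta, center) for the drawn axis of symmetry p1 -> p2.

    theta is the deviation of the line from the image y-axis, in radians.
    The direction is normalized so the result does not depend on which
    endpoint the user drew first.
    """
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    if dy < 0:
        dx, dy = -dx, -dy
    theta = math.atan2(dx, dy)
    center = ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0)
    return theta, center


def rotate_point(pt, center, theta):
    """Rotate pt about center so a line at deviation theta becomes vertical."""
    x, y = pt[0] - center[0], pt[1] - center[1]
    c, s = math.cos(theta), math.sin(theta)
    return (center[0] + x * c - y * s, center[1] + x * s + y * c)


# Axis drawn slightly tilted; after correction both endpoints share one x.
top, bottom = (100.0, 50.0), (120.0, 250.0)
theta, center = axis_correction(top, bottom)
new_top = rotate_point(top, center, theta)
new_bottom = rotate_point(bottom, center, theta)
```

The subsequent lateral centering amounts to shifting every x coordinate by half the image width minus the x coordinate of the corrected axis.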

Translational Rectification

Now that the images are approximately aligned from a rotational standpoint, the next step is to adjust for any vertical offsets between the object's locations in the images. (The horizontal offset is taken care of by the final lateral translation in the Rotational Rectification step.)

This is done using an animated jog effect, where the two images to be rectified are alternately double buffered and swapped automatically on a ¼ second interval, or a value close to the flicker fusion frequency for human perception, which provides visual persistence of each image and a transparency effect where both images are effectively superimposed (See FIG. 30). A user interface mechanism (e.g. a slider oriented in the image y-axis direction) is provided for each of the two respective images to adjust the y-offset of the respective image. When the user is satisfied with the offset, a “done” button is hit to lock the alignment. The result is shown in FIG. 31.

This process is repeated for each consecutive pair of images in the sequence, if needed.

Scaling Rectification

Now that the images are approximately aligned and translated, the final rectification step is to adjust for any variations in object scale that might have occurred due to variations in camera range to the object during the photo shoot.

This is done using an animated jog effect, where the two images to be rectified are alternately double buffered and swapped automatically on an approximate ¼ second interval (or a value close to the flicker fusion frequency for human perception), which provides visual persistence of each image and a transparency effect where both images are effectively superimposed.

For scaling, an origin of the scale must be defined. The x-coordinate of the scaling origin is constrained to lie on the axis of symmetry line, which leaves only the y-coordinate of the scaling origin to be selected. This location can be indicated by a sliding center point, shown as a cross-hair in FIG. 32, which can be moved along the virtual axis of symmetry line by the user using direct mouse manipulation. The aspect ratio is fixed for this scaling operation. The result is illustrated in FIG. 33.
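A fixed-aspect-ratio scaling about a chosen origin maps each point p to o + s·(p − o). The short sketch below illustrates this; the function name `scale_about` and the sample coordinates are hypothetical. Note that points lying on a vertical axis of symmetry through the origin stay on that axis, which is why placing the origin on the axis preserves the earlier lateral centering.

```python
def scale_about(origin, point, factor):
    """Uniformly scale point about origin (same factor in x and y)."""
    ox, oy = origin
    px, py = point
    return (ox + factor * (px - ox), oy + factor * (py - oy))


# Origin chosen on a vertical axis of symmetry at x = 10.
origin = (10.0, 50.0)
on_axis = scale_about(origin, (10.0, 20.0), 2.0)   # remains at x = 10
off_axis = scale_about(origin, (14.0, 50.0), 2.0)  # moves away from the axis
```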

Auto Crop

After the above steps have been carried out on the entire sequence in a pairwise manner on adjacent images, the resulting sequence may have odd borders and gaps in the image due to the applied rotation, scaling and translation operations. It is desirable to crop the images to a minimum inscribed rectangle which eliminates the odd perimeter and image gaps. This can be done automatically in the following fashion.

First, the intersection of the current image areas is computed automatically. The perimeter of this intersection is a convex polygon, as illustrated in FIG. 34 for two images. While this illustration is for two images, the approach described here applies to any number of images.

The next step is to find an inscribed rectangle in this polygon. An inscribed rectangle is illustrated in FIG. 35. There are a number of potential inscribed rectangles for any polygon, so one must be found which maximizes any one of a number of possible criteria. We may choose to maximize area, width, height, perimeter, or symmetry about the virtual axis of symmetry, for example. In this case we choose to maximize area, as illustrated in FIG. 36. The entire sequence of images is cropped against this maximum area inscribed rectangle to yield a cropped rectified sequence as illustrated in FIG. 37. Finally, it is desirable to make the axis of symmetry centered in the entire sequence. This can be done by cropping the sequence again, such that the axis of symmetry is horizontally centered in the sequence, as illustrated in FIG. 38.
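One way to find a maximum-area inscribed rectangle is to work on a pixel mask rather than on the polygon itself. The sketch below assumes the intersection of the image areas has been rasterized into a mask that is truthy where every image has valid data, and then finds the largest axis-aligned all-valid rectangle with the standard histogram-and-stack method. This is a substituted technique for illustration, not necessarily the procedure used in the described system, and the function name `largest_rectangle` is hypothetical.

```python
def largest_rectangle(mask):
    """Return (top, left, height, width) of the largest all-truthy rectangle."""
    if not mask or not mask[0]:
        return (0, 0, 0, 0)
    cols = len(mask[0])
    heights = [0] * cols          # run length of valid pixels ending at this row
    best, best_area = (0, 0, 0, 0), 0
    for row_idx, row in enumerate(mask):
        for c in range(cols):
            heights[c] = heights[c] + 1 if row[c] else 0
        stack = []                # column indices with increasing heights
        for c in range(cols + 1):
            cur = heights[c] if c < cols else 0   # sentinel flushes the stack
            while stack and heights[stack[-1]] >= cur:
                h = heights[stack.pop()]
                left = stack[-1] + 1 if stack else 0
                width = c - left
                if h * width > best_area:
                    best_area = h * width
                    best = (row_idx - h + 1, left, h, width)
            stack.append(c)
    return best


# Valid-pixel mask of an intersection region (1 = covered by all images).
mask = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
]
crop = largest_rectangle(mask)
```

Every image in the sequence would then be cropped to the returned rectangle; the final re-centering on the axis of symmetry is a second, narrower crop.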

Determination of Center of Rotation

Alternatively, the center of the rotating platform may be marked and the video camera image can use a synthetic reticle down its center (a vertical line which terminates at the visible center dot on the platform) to align the center of the platform with the center of the optic axis of the camera. This is illustrated in FIG. 40. The object can then be positioned using this synthetic reticle such that it rotates in a symmetric fashion in the image sequence.

Compression

One of the major problems to be overcome in order to make omni-directional viewing technology practical is the long download time for omni-directional view sequences when a limited-speed connection over a communications network such as the Internet is used. In order for consumers to avail themselves of the opportunity to browse and interact with products using omni-views, a parsimonious and highly compressed description of the views is highly desirable. It is also necessary that whatever compression technology is used maintains the image quality while decreasing the amount of time that it takes to download the object. We describe a view sequence compression designed for a set of omni-directional views that achieves compression by using redundant visual information overlap from neighboring omni-directional views.

If only small changes in the actuators occur from frame to frame in the set of omni-directional views that are sampled, a large amount of shared information may be present in the adjoining views. This sequence of adjoining view digital images may be treated as a digital video sequence and compressed using any one of a number of existing digital video compression techniques and standards, such as MPEG-1, MPEG-2 or newer standards such as MPEG-4. The system differs from these existing approaches in that the file is not encoded using a minimum of B frames. This can be achieved since there are no large discontinuities, because the object is sampled from adjoining points in the view sphere. However, rather than treating the video sequence as a real-time stream, the compressed sequence can be downloaded and the image sequence decompressed and reconstructed by a CODEC on the client. Once the original image sequence has been reconstructed, the image sequence can be cached on the browser client and interactively controlled. This process is illustrated in FIG. 41.

Furthermore, hyper-compression may be achieved by allowing the client to interpolate between key stored views using any one of a number of techniques for image space morphing. In this case, the key views and morphing parameters are transmitted to the media player, which can then dynamically render, or pre-render, intermediate views and store them for fast viewing.

This sequence of images which tile the view sphere can be indexed using a number of different tessellations of the view sphere. For example, a Geodesic tessellation or Cartesian tessellation may be employed as illustrated in FIG. 42 and FIG. 43. Each point on the tessellation can be linked to its nearest neighbor view points, both in azimuth and elevation as well as zoom. By using on-screen controls to allow the user to traverse this tessellation, and thus the sequence of views, the user may be given the impression of interactively rotating the object around in three dimensions and zooming in and out.

Exploitation of Human Motion Perception System Characteristics

Enhancements to the above system are possible to achieve even better compression at the expense of some viewpoint flexibility for the user. The perceptual capabilities of the human visual system are such that the spatial resolution for dynamic moving scenes is much less than that of a static scene. This non-uniformity of resolution can be exploited by using lower resolution sequences when the object is being dynamically rotated by the user and then selecting a key frame (which has a key view) that is encoded at a higher resolution when the slider bar is released by the user, as illustrated in FIG. 17. This allows users to more closely inspect the detail of the object in key views. Additionally, these key views may be encoded in a pyramid representation. Thus when the viewer applet detects that the slider bar has not moved for more than a given timeout, the system downloads progressively higher resolution incremental pyramid representation layers for the given view. This pyramid representation also allows for dynamic zooming into areas of the object for closer inspection.

View Sphere Encodings

The sampling of the view sphere surrounding the object can be done using a variety of regular constructions including a spherical coordinate grid mapping (See FIG. 42) or a Geodesic or other uniform tiling of the sphere (See FIG. 43). These grid mappings on the sphere are known as the view sphere. The spherical coordinate grid mapping can be unfolded and flattened into a view torus and each view indexed by an azimuth and elevation index i,j (See FIG. 44) or a vertex index (See FIG. 45). The i,j-th index refers to the image acquired at the set of actuator values which correspond to a camera view with the optic axis of the camera pointing to the origin of the sphere and the camera focal point at a given location on the surface of the view sphere, as illustrated in FIG. 42.
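The mapping from an azimuth and elevation index to a camera pose can be sketched as follows. The grid layout is an assumption made for illustration: azimuth steps are spread evenly over 360 degrees, elevation bands are sampled at their centers between the two poles, and the function names `view_position` and `azimuth_neighbors` are hypothetical. The optic axis is returned as a unit vector pointing from the camera toward the origin of the sphere.

```python
import math


def view_position(i, j, n_azimuth, n_elevation, radius=1.0):
    """Camera position and optic-axis direction for view index (i, j)."""
    azimuth = 2.0 * math.pi * (i % n_azimuth) / n_azimuth
    elevation = -math.pi / 2.0 + math.pi * (j + 0.5) / n_elevation
    x = radius * math.cos(elevation) * math.cos(azimuth)
    y = radius * math.cos(elevation) * math.sin(azimuth)
    z = radius * math.sin(elevation)
    optic_axis = (-x / radius, -y / radius, -z / radius)
    return (x, y, z), optic_axis


def azimuth_neighbors(i, n_azimuth):
    """Indices of the adjoining views in azimuth, wrapping around 360 degrees."""
    return ((i - 1) % n_azimuth, (i + 1) % n_azimuth)


# Equatorial view at azimuth 0 on a sphere of radius 2 (8 x 3 grid).
position, axis = view_position(0, 1, n_azimuth=8, n_elevation=3, radius=2.0)
left, right = azimuth_neighbors(0, 8)
```

A zoom index, as described for view shells, could be added as a third argument that selects the radius.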

In the case of the spherical mapping, it is desirable that the views in the file sequence be ordered such that progressive downloading of views is possible. For example, rotational views taken every 90 degrees can first be downloaded in a breadth first fashion, followed by the interposed 45 degree views, then the 22.5 degree views, etc. This allows for a coarsely quantized (e.g. every 90 degrees) 360 degree view set to be available rapidly and viewable before all intermediate views are downloaded and rendered by the viewer.
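The breadth-first ordering can be illustrated with a short sketch. It assumes a single ring of rotational views and a base spacing of 90 degrees; each refinement level inserts the views midway between those already listed, so the spacing halves from 90 to 45 to 22.5 degrees. The function name `progressive_order` is hypothetical.

```python
def progressive_order(levels, base_step=90.0):
    """Azimuth angles in download order: coarse ring first, then midpoints."""
    count = int(round(360.0 / base_step))
    order = [k * base_step for k in range(count)]
    step = base_step
    for _ in range(levels):
        half = step / 2.0
        # Views interposed between the ones already scheduled.
        order.extend(half + k * step for k in range(count))
        count *= 2
        step = half
    return order


coarse_then_45 = progressive_order(1)
down_to_22_5 = progressive_order(2)
```

Writing the views into the file in this order lets the viewer present a usable 360 degree set after only the first four images have arrived.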

The advantage of the Geodesic triangulation is that it is a uniform tiling of the sphere, which means the change in view is uniform for any change in view index to a neighboring view point, independent of the current view location. This is not the case with a latitude, longitude spherical coordinate tiling. The Geodesic tiling also allows a good approximation to a great circle trajectory between any two points for smoother panning. This gives a more uniform viewing experience and predictable view change for trajectories along the view sphere, as compared to a simple Cartesian spherical or cylindrical coordinate mapping.

Each index in the above representations can be augmented with a third index which represents a zoom factor, equivalent to an effective optical absolute distance of the camera to the object that is achieved by varying the focal length of the zoom lens. Thus a set of “view shells” of view spheres can be indexed by a third index which specifies the shell being selected.

Additionally, each location can be augmented with camera pitch and yaw offsets, which can be integer or angular, and which allow the camera to fixate on portions of the object not centered at the origin of the sphere.

Progressive Downloading

The sequence of images in the multimedia object sequence M can be of progressively higher resolution. It is convenient to use the Gaussian Pyramid Representation. Assume the N images are taken in rotational sequence around the object with resolution 2^m by 2^m pixels. As m increases by 1, the size of the image in pixels quadruples. Therefore it is desirable to first download the lowest possible resolution (m small, e.g. 6), then gradually increase m and download the higher resolution pyramid coefficients and re-render the image, showing progressively more detail. The first sequence can be displayed interactively and the images updated in the background and swapped in as the finer detail images arrive. Since motion vision has lower spatial resolution than static vision in humans, the viewer will be able to understand the 3D structure initially and then, as further detail is desired at later moments, the higher resolution images will become available. This description is not meant to rule out other progressive downloading techniques such as those enabled by multi-scale wavelet or fractal encodings.
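The growth in data per pyramid level can be made concrete with a small calculation. The sketch below counts uncompressed pixels only, ignoring compression and the exact pyramid coefficient layout; the function name `pyramid_levels` and the choice of 36 images are illustrative assumptions.

```python
def pyramid_levels(m_min, m_max, n_images):
    """List (m, side_length, total_pixels) for each resolution level.

    Each image at level m is 2^m by 2^m pixels, so a full sequence of
    n_images holds n_images * 4^m pixels at that level.
    """
    return [(m, 2 ** m, n_images * 4 ** m) for m in range(m_min, m_max + 1)]


# 36 rotational views, from 64 x 64 up to 512 x 512 pixels.
levels = pyramid_levels(6, 9, n_images=36)
```

Each step up in m multiplies the pixel count by four, so the m = 6 sequence is 1/64 the size of the m = 9 sequence and can be made interactive long before the full-resolution data arrives.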

Miscellaneous

Enhanced Registration Between On-Line and Self-Contained Kiosk (Public Access)

Each item to be acquired must be entered and indexed into a database in the Storage and Caching Server indicated in FIG. 2 in a registration step. Normally the user enters information regarding the index and object specific descriptive information through the Host Application Server indicated in FIG. 2. The user may need to enter descriptive textual information regarding the type, quality, features and condition of the object, which can take some time to type in. In the embodiment for the Self-Contained Kiosk located in a public location, it is desirable to avoid carrying out this registration at the self-contained scanner, since it could be a time-consuming process and could lead to slow throughput and underutilization of the scanner. Because it may take a user a significant amount of time to register a given object, it is desirable to carry out the registration process on another PC. This permits the user to take as much time as needed, without time pressure, to carry out the registration of the item. Once this registration is complete, the user may utilize the public access scanner solely for image acquisition, thus maximizing the availability of the system. However, registration of the item in one location and photography in another leads to the need to link the particular database entry to the image sequence to be acquired. Each view sequence must be uniquely identified. As a database of view sequences grows larger, each identifier for a database record corresponding to a view sequence must grow longer to maintain uniqueness as a primary database key (See FIG. 18). Unfortunately such long identifiers may be cumbersome for users to remember and to key in to the system when scanning a new object. In particular, the unique identifier may correspond to a uniform resource locator which specifies the location on the Internet where the view sequence is located and may be viewed or linked. With a long sequence number and URL, the possibility that the user will mis-type or forget the index increases. We describe a process which decreases this possibility and simplifies the process for the user.

In our system, it is useful to facilitate the use of such a scanning system in linking the objects to a Uniform Resource Locator (URL) by use of a bar-code which encodes a unique identifying alphanumeric sequence that will link to the published scan location URL.

As FIG. 46 illustrates, using a subset of the elements indicated in FIG. 2, an individual who desires to perform image acquisition of an object can connect to the application server via a communications link (such as the Internet). The individual can connect to the scan-service's Host Application Server and request a new unique identifier for an object. Optionally, the user may enter a textual description and title for the object to be scanned. After this information is entered, a process at the Host Application Server's site generates a digital representation of the bar-code which encodes the unique object identifier and sends that representation to the user's computer. The user may then print out the bar-code using a printer attached to the user's client computer, as illustrated in FIG. 7.

This printed bar-code is then brought to the publicly situated scanning kiosk and scanned by a bar-code scanner which is part of the scanning kiosk, as illustrated in FIG. 8. In FIG. 8, the user has brought the object corresponding to the bar-code, along with the printed bar-code, to a location such as a retail point of sale location in a copy center (e.g. Kinko's). The user places the object in the Object View Acquisition Kiosk. The printed bar-code is scanned and then the view acquisition is activated. The kiosk acquires, compresses and formats the view sequence file and sends it over the communications link to the Application Server, which stores the sequence in the view sequence database using the scanned unique object identifier as its retrieval key. The user may review the quality of the view sequence using the preview display available on the kiosk scanner before finalizing the view sequence in the database.

By using this approach, no typing is needed at the kiosk, since the data entry can be carried out at another location, such as in the user's home, using their home PC. This increases the speed at which items can be scanned and maximizes the utilization of the machine, decreasing the wait when a queue forms at the machine. Additionally, since the user need not key in information at the kiosk, there is nothing for them to mis-key at the kiosk. Data entry can be done at the leisure of the user on their home PC, and they need only bring the printed bar-code corresponding to the item they are going to scan. This decreases the amount of time that the user must spend at the public scanner, which maximizes the availability and throughput of the scanner.

Flash or other Formats

The use of Java based multimedia programs as an example in this document is not meant to restrict the use of these techniques; other multimedia program formats such as Macromedia Flash Scripts or equivalent may be used.

Additional Multimedia Capability

Other types of dynamic multimedia image presentations that may be generated using the above processes include rollover or hot spot based zoom, where a magnified image of a region may be activated by clicking in a highlighted zone in the image to reveal further detail about the object, as well as additional textual information.

The same sequential image selection techniques may be used to animate the function of objects, rather than their rotation, by stepping through a sequence of images showing the articulation of a given object.

This is not meant to restrict the type of multimedia techniques which may be achieved with the herein mentioned processes and architecture.

Tracking of Utilization of Applets in Email

With the addition of a unique ID (such as a GUID or UUID) embedded in each generated applet, described notationally as A(k,ID) in the encapsulated set {A(k,ID),E(K),S}, a system for tracking the utilization and effectiveness of the applet when embedded in a multimedia email may be accomplished. Each time the applet is executed on a client (a “view”), its unique ID can be sent back to a tracking server which can correspond the unique ID with the identity of a user that was sent the message, or with a pseudonym which persistently links a user ID to a person while maintaining stronger confidentiality. If total anonymity is required by the respondents, the total number of applet views may be tabulated to gauge the effectiveness and response rate of the media campaign.

In the formation of a mailing list, a table of correspondence between the ID and the email recipient address may be formed which is used to track the utilization and forwarding of the applet. In particular, the applet may connect back with a particular tracking server whenever the applet is activated and report the duration of viewing as well as any interactive events and durations, which can be used to monitor the effectiveness of a given multimedia presentation. In particular, http links may be embedded in the multimedia sequence and, when activated, the selection of the particular events can be reported to the tracking server to tabulate the overall response and escalation of interest of the particular viewing event. Secondly, by uniquely keying each applet, the tracking of forwarded emails is also possible, which can also be used to grade the effectiveness of a given campaign.
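The correspondence table and view tabulation can be sketched as below. This is an illustrative sketch only: the function names `build_mailing_table` and `tabulate_views` and the example addresses are hypothetical, the IDs are random UUIDs from the Python standard library, and the network reporting from applet to tracking server is represented simply as a list of reported IDs.

```python
import uuid
from collections import Counter


def build_mailing_table(recipients):
    """Assign each recipient address a unique ID to embed in their applet."""
    return {str(uuid.uuid4()): address for address in recipients}


def tabulate_views(reported_ids, table):
    """Count views per recipient; IDs not in the table are ignored."""
    counts = Counter(reported_ids)
    return {table[i]: n for i, n in counts.items() if i in table}


table = build_mailing_table(["a@example.com", "b@example.com"])
ids = list(table)
# The first recipient's applet was viewed twice, the second once.
views = tabulate_views([ids[0], ids[0], ids[1], "unknown-id"], table)
total_views = sum(views.values())
```

For the fully anonymous case described above, only `total_views` would be retained and the per-recipient table discarded.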

One Click View Linking

A view sequence enablement button may be added to a page in the merchant or auction web site which describes the item for sale. When an authenticated and authorized user clicks that enablement button, a process executes on the storefront web site which lists the available view sequences that are currently hosted and available to that user. The user can select the appropriate view sequence. The process on the merchant's web site responds by adding the appropriate commands to the page, which links the view sequence and embeds it into the page automatically. This process is termed “one-click view linking.”

This “one-click view linking” may be implemented in the following manner. The “click to link” button is a hyperlink to a given URL which is parameterized by the subscriber's name. The page at that URL, which is dynamically created from the image database, contains a list of thumbnails for the given subscriber, as stored by the image sequence database. Each of the thumbnails is itself a dynamically created hyperlink which embeds the referring page name as a parameter. By clicking the hyperlink, a CGI script is instantiated which causes the subscriber host to establish a connection message which indicates the referring page which is to be updated with the URL of the desired sequence. The Target updates the link and acknowledges. After this acknowledgement, the current page is auto-referred back to the original page having the one-click button.

Javascript Media Viewing Implementation

It may be desirable to use a Javascript program on the Web Browser client to render the multimedia sequence instead of using a Java applet, since certain browsers may not support the Java language, or may have the language disabled as a result of the browser's configuration options. Normally, it is not possible to have a “slider” graphical user interface component controlling screen state without Java or ActiveX extensions to a browser. The following approach allows the simulation of a slider component. FIG. 50 illustrates a web page layout with two image document objects within a web browser: the View Image, which is used to render a particular image representing a particular view of the object, and the Slider Image, which is used to dynamically present the state of the slider control. A slider control may be simulated by pre-rendering the slider in all possible positions, along with the set of View images, as illustrated in FIG. 51. A Javascript program embedded in the HTML code for a web page may be used to establish an image map which breaks the slider image into a set of areas. When the user's mouse is passed over each respective image map area, the appropriate view and slider images are dynamically loaded into their respective document objects, replacing the currently rendered images. As this occurs dynamically, the effect is to animate smoothly the changing slider bar and corresponding object views. A representative activation sequence is illustrated in FIG. 51, where the arrows from each image map area point to the particular images that are loaded into the View Image document object location and the Slider Image document object location respectively. While this figure illustrates this for 4 potential slider locations and corresponding views, the approach may be generalized to an arbitrary number of views by splitting the slider object into a set of image map areas which evenly divide the image area width for the slider, and loading the corresponding view image for each slider image area. FIG. 52 is a listing of Javascript source code which implements the diagram depicted in FIG. 51.
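The even division of the slider into image map areas, and the lookup from a mouse position to the images to load, can be sketched as follows. The sketch is written in Python to show the arithmetic only and is not the Javascript listing of FIG. 52; the function names and the image file names (`view_0.jpg`, `slider_0.gif` and so on) are hypothetical.

```python
def slider_areas(slider_width, n_views):
    """Horizontal (left, right) pixel bounds of each image map area."""
    return [
        (round(k * slider_width / n_views),
         round((k + 1) * slider_width / n_views))
        for k in range(n_views)
    ]


def area_index(mouse_x, slider_width, n_views):
    """Index of the area under the mouse, clamped to the valid range."""
    idx = int(mouse_x * n_views / slider_width)
    return max(0, min(n_views - 1, idx))


def images_for(index):
    """Pre-rendered view and slider images for one slider position."""
    return ("view_%d.jpg" % index, "slider_%d.gif" % index)


# A 400 pixel wide slider divided among 4 views.
areas = slider_areas(400, 4)
view_image, slider_image = images_for(area_index(250, 400, 4))
```

In a browser, each area would carry a mouse-over handler that assigns the two returned file names to the source of the View Image and Slider Image document objects.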

It is understood, therefore, that the present invention is susceptible to many different variations and combinations and is not limited to the specific embodiments shown in this application. The terms “server”, “computer”, “computer system” or “system” as used herein should be broadly construed to include any device capable of receiving, transmitting and/or using information, including, without limitation, a processor, microprocessor or similar device, a personal computer such as a laptop, palm, PC, desktop or workstation, a network server, a mainframe, and an electronic wired or wireless device. Further, a server, computer, computer system, or system of the invention may operate in communication with other systems over any type of network, such as, for example, the Internet, an intranet, or an extranet, or may operate as a stand-alone system. In addition, it should be understood that the elements disclosed do not all need to be provided in a single embodiment, but rather can be provided in any desired combination of elements. It will also be appreciated that a system in accordance with the invention can be constructed in whole or in part from special purpose hardware or from conventional general purpose hardware or any combination thereof, any portion of which may be controlled by a suitable program. Any program may in whole or in part comprise part of or be stored on a system in a conventional manner, or may in whole or in part be provided to the system over a network or other mechanism for transferring information in a conventional manner. Accordingly, it is understood that the above description of the present invention is susceptible to considerable modifications, changes, and adaptations by those skilled in the art and that such modifications, changes and adaptations are intended to be considered within the scope of the present invention, which is set forth by the appended claims.

1. An image processing method for identifying figure and background for the purpose of matteing and compositing, wherein two images are input, one with a foreground object and the other without, and the background areas are taken under similar scene illumination, the method comprising the steps of: computing the per pixel gray level absolute difference in intensity or vector color magnitude image difference between corresponding pixels in the two images; selecting those pixel locations which are above a relatively small threshold in terms of gray level difference or vector color magnitude difference to form a mask which selects only foreground object pixel locations.
2. The method of claim 1, wherein the masks are combined via a logical “OR” operation to generate a combined foreground object selection mask.
3. The method of claim 1, wherein features are identified and corresponded between two frames and affine transform components are used to align the two frames.
4. The method of claim 1, wherein an alignment wizard consisting of rotational, translation and scaling visual displays and GUIs are used to guide and assist a user in rectifying a media sequence of individual captured images.
5. The method of claim 4, wherein the media sequence is further processed to automatically crop for a maximum inscribed rectangle in the sequence, and the maximum inscribed rectangle is further centered around an indicated axis of symmetry for an object of interest.
6. The method of claim 1, wherein a unique database primary key corresponding to an object to be acquired is generated on a home personal computer, a bar coded encoding of the unique database primary key is printed on the home personal computer, the print out is brought to a self-contained view acquisition unit vending system and the bar code scanned, to avoid the re-keying of that unique database primary key.
7. An image editing method for identifying figure and background for the purpose of matting and compositing an object having a visual pattern or texture, which is placed on a textureless rotating platform in front of a fixed camera, and the background of the object is stationary and two or more images are captured, the method comprising the steps of: computing optical flow or time derivative for each pixel of each image in the image sequence, for yielding a flow vector having a magnitude and direction for each pixel; computing a magnitude for each pixel if the optical flow measure is used; threshold and label each pixel in the flow field with flow vector magnitude greater than threshold Θ; and selecting pixels which are above a relatively small threshold in terms of optical flow magnitude, to form a mask which selects only foreground object pixels.
8. The method of claim 7, wherein the masks are combined via a logical “OR” operation to generate a combined foreground object selection mask.
9. The method of claim 7, wherein features are identified and corresponded between two frames and affine transform components are used to align the two frames.
10. The method of claim 7, wherein an alignment wizard consisting of rotational, translation and scaling visual displays and GUIs are used to guide and assist a user in rectifying a media sequence of individual captured images.
11. The method of claim 10, wherein the media sequence is further processed to automatically crop for a maximum inscribed rectangle in the sequence, and the maximum inscribed rectangle is further centered around an indicated axis of symmetry for an object of interest.
12. The method of claim 7, wherein a unique database primary key corresponding to an object to be acquired is generated on a home personal computer, a bar coded encoding of the unique database primary key is printed on the home personal computer, the print out is brought to a self-contained view acquisition unit vending system and the bar code scanned, to avoid the re-keying of that unique database primary key.